Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

of protein sequences with different backgrounds. Therefore, differences may exist between these mutation matrices. The problem of which mutation matrix is best for modelling a specific data set using the bio-basis function thus needs careful consideration. Certainly, the use of a single mutation matrix generates a typical set of bio-basis features. The use of different mutation matrices thus generates different sets of bio-basis function features. Optimising these bio-basis features is of course an important issue in making a protease cleavage discrimination model as accurate and as robust as possible. Therefore, one attempt was to use as many available mutation matrices as possible to model this factor Xa protease cleavage data. A bio-DNN model employing the ten mutation matrices shown in Table 3.11 was thus called a mixture bio-DNN model. Figure 3.36(b) shows the ROC curve of a mixture bio-DNN model constructed for the factor Xa protease cleavage data.

Table 3.11. Ten mutation matrices used in the mixture bio-DNN model for factor Xa protease cleavage data.

Matrix       Reference
Altschul     [Altschul, 1991]
Blosum62     [Henikoff and Henikoff, 1992]
Dayhoff      [Dayhoff, et al., 1978]
Gonnet       [Benner, et al., 1994]
Grantham     [Grantham, 1974]
Henikoff     [Henikoff and Henikoff, 1992]
Johnson      [Feng, et al., 1984]
Jones        [Jones, et al., 1992]
Levin        [Levin, et al., 1986]
McLachlan    [Luthy, et al., 1991]
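
The idea of building bio-basis features from several mutation matrices can be illustrated with a minimal R sketch. It rests on stated assumptions and is not the implementation behind the mixture bio-DNN model. The bio-basis function is assumed to take the commonly published form exp(gamma * (h(x, s) - h(s, s)) / h(s, s)), where h sums the matrix scores over aligned residues. The two matrices below are made-up toy scores over a five-letter alphabet, not real Blosum62 or Dayhoff values. The names make_matrix, peptide_score, bio_basis and mixture_features are hypothetical helpers introduced only for illustration. Binding one feature block per matrix column-wise is one plausible reading of "mixture"; the downstream network is not shown.

```r
# Toy alphabet and two made-up substitution matrices (NOT real published scores).
alphabet <- c("A", "E", "G", "I", "R")
make_matrix <- function(diag_score, off_score) {
  m <- matrix(off_score, nrow = length(alphabet), ncol = length(alphabet),
              dimnames = list(alphabet, alphabet))
  diag(m) <- diag_score
  m
}
toy_matrices <- list(
  toyA = make_matrix(diag_score = 4, off_score = -1),
  toyB = make_matrix(diag_score = 6, off_score = 0)
)

# Summed residue-by-residue score of two equal-length peptides under one matrix.
peptide_score <- function(x, s, m) {
  xs <- strsplit(x, "")[[1]]
  ss <- strsplit(s, "")[[1]]
  stopifnot(length(xs) == length(ss))
  sum(m[cbind(xs, ss)])  # two-column character index: (row name, column name)
}

# Assumed bio-basis form: equals 1 when x is identical to the support peptide s.
bio_basis <- function(x, s, m, gamma = 1) {
  self <- peptide_score(s, s, m)
  exp(gamma * (peptide_score(x, s, m) - self) / self)
}

# One block of features per matrix, bound column-wise into a single feature matrix.
mixture_features <- function(peptides, supports, matrices, gamma = 1) {
  blocks <- lapply(names(matrices), function(nm) {
    block <- sapply(supports, function(s) {
      sapply(peptides, function(x) bio_basis(x, s, matrices[[nm]], gamma))
    })
    matrix(block, nrow = length(peptides),
           dimnames = list(peptides, paste(nm, supports, sep = ":")))
  })
  do.call(cbind, blocks)
}

peptides <- c("IEGR", "IAGR", "AAAA")  # hypothetical query peptides
supports <- c("IEGR", "AEGR")          # hypothetical support peptides
features <- mixture_features(peptides, supports, toy_matrices)
print(round(features, 3))
```

Each row of features describes one peptide, and each column pairs one matrix with one support peptide, so adding a matrix adds a block of columns. With the ten matrices of Table 3.11 in place of the toy ones, the same layout would yield ten such blocks.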

Inductive learning

An objective of every prediction model is to indicate what will happen in the future. Many machine learning algorithms are able to deliver a prediction model for a problem through a proper training process. However, a question that very often arises is why and how a prediction is delivered. In many applications, the capability of interpreting a constructed model and a prediction has been a desirable feature.